As we all know, with the advent of mobile internet access for all, the world has moved from MP3 downloaded songs to streaming services. A streaming service maybe paid/unpaid subscription that lets you stream songs with a stable internet connection on your device. A paid subscription also lets you download songs on to your device to be listened to anytime even with your internet connection.

Currently, Spotify and Apple Music are the top two streaming services available. Apple Music is primarily used by iOS/MacOS users while Spotify is used by any platform using internet services. Currently, as of 2020, Spotify has about 130 million subscribers compared to Apple Music with 75 million subscribers.

Hence, we can say Spotify is the leading music service by quite a distance. These music streams use curated playlists through various ML recommendation systems that keep feeding you with interesting songs and hence, keeping you interested in their services constantly.

Through this kernel, we would like to analyse the top 50 songs published by Spotify in 2019. We will check the various genres, popularity and some other features that made the songs to famous amongst the users.

1 Importing the relevant libraries and dataset

Code

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

Code

df=pd.read_csv('top50.csv',encoding='latin_1',index_col=0)
df.head()

	Track.Name	Artist.Name	Genre	Beats.Per.Minute	Energy	Danceability	Loudness..dB..	Liveness	Valence.	Length.	Acousticness..	Speechiness.	Popularity
1	Senorita	Shawn Mendes	canadian pop	117	55	76	-6	8	75	191	4	3	79
2	China	Anuel AA	reggaeton flow	105	81	79	-4	8	61	302	8	9	92
3	boyfriend (with Social House)	Ariana Grande	dance pop	190	80	40	-4	16	70	186	12	46	85
4	Beautiful People (feat. Khalid)	Ed Sheeran	pop	93	65	64	-8	8	55	198	12	19	86
5	Goodbyes (Feat. Young Thug)	Post Malone	dfw rap	150	65	58	-4	11	18	175	45	7	94

As we can see from the above graph above, we have various features such as Beats per min, genre, Energy, Danceability,Loudness, etc. to see how the songs stack up against each other. Since the data is quite detailed and clean, we will not try to wrangle with the data anymore. Hence, we shall directly head to data visualisation section.

2 Data Visualisation

2.1 a) Top Artists of 2019

Let us check which artists featured most frequently in the top 50 songs. We will try to obtain the top 10 most popular artists of 2019.

Code

df['Count']=1
df_artist=df.groupby('Artist.Name')['Count'].sum().reset_index().sort_values(by='Count',ascending=False)
df_artist.head(10)

	Artist.Name	Count
9	Ed Sheeran	4
19	Lil Nas X	2
32	Shawn Mendes	2
25	Marshmello	2
28	Post Malone	2
31	Sech	2
10	J Balvin	2
34	The Chainsmokers	2
4	Billie Eilish	2
2	Ariana Grande	2

Code

sns.catplot(x='Artist.Name',y='Count',data=df_artist.head(10),kind='bar',height=8,aspect=2,palette='winter')
plt.title('Top 10 artists of 2019',size=25)
plt.xlabel('Artist name',size=15)
plt.ylabel('Number of songs in top 50',size=15)
plt.xticks(size=15,rotation=45)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 [Text(0, 0, 'Ed Sheeran'),
  Text(1, 0, 'Lil Nas X'),
  Text(2, 0, 'Shawn Mendes'),
  Text(3, 0, 'Marshmello'),
  Text(4, 0, 'Post Malone'),
  Text(5, 0, 'Sech'),
  Text(6, 0, 'J Balvin'),
  Text(7, 0, 'The Chainsmokers'),
  Text(8, 0, 'Billie Eilish'),
  Text(9, 0, 'Ariana Grande')])

As we can see from the plot above, Ed Sheeran performed the best with 4 songs in the top 50 charts. The notable other artists with more than 1 song have been shown above. All other artists had not more than 1 song in the top 50.

2.2 b) Top Genres of 2019

Let us now check which genre of songs made it the most to the most to the top 50. This gives us a good idea of which genres are the most popular ones.

Code

df_genre=df.groupby('Genre')['Count'].sum().reset_index().sort_values(by='Count',ascending=False)
df_genre.head()

	Genre	Count
8	dance pop	8
15	pop	7
13	latin	5
10	edm	3
5	canadian hip hop	3

Let us visualise the above data into a pie plot as shown below.

Code

fig=px.pie(df_genre,values='Count',names='Genre',hole=0.4)
fig.update_layout(title='Top genres of 2019',title_x=0.45,annotations=[dict(text='Genre',font_size=20, showarrow=False,height=800,width=700)])
fig.update_traces(textfont_size=15,textinfo='percent')
fig.show()

2.3 c) Top 10 tracks of 2019

Let us check the top 10 tracks of 2019 based on popularity rating.

Code

df_top_tracks=df.sort_values(by='Popularity',ascending=False).head(10).reset_index(drop=True)
df_top_tracks.index=df_top_tracks.index + 1
df_top_tracks

	Track.Name	Artist.Name	Genre	Beats.Per.Minute	Energy	Danceability	Loudness..dB..	Liveness	Valence.	Length.	Acousticness..	Speechiness.	Popularity	Count
1	bad guy	Billie Eilish	electropop	135	43	70	-11	10	56	194	33	38	95	1
2	Goodbyes (Feat. Young Thug)	Post Malone	dfw rap	150	65	58	-4	11	18	175	45	7	94	1
3	Callaita	Bad Bunny	reggaeton	176	62	61	-5	24	24	251	60	31	93	1
4	Money In The Grave (Drake ft. Rick Ross)	Drake	canadian hip hop	101	50	83	-4	12	10	205	10	5	92	1
5	China	Anuel AA	reggaeton flow	105	81	79	-4	8	61	302	8	9	92	1
6	Ransom	Lil Tecca	trap music	180	64	75	-6	7	23	131	2	29	92	1
7	Otro Trago	Sech	panamanian pop	176	70	75	-5	11	62	226	14	34	91	1
8	Panini	Lil Nas X	country rap	154	59	70	-6	12	48	115	34	8	91	1
9	Piece Of Your Heart	MEDUZA	pop house	124	74	68	-7	7	63	153	4	3	91	1
10	Truth Hurts	Lizzo	escape room	158	62	72	-3	12	41	173	11	11	91	1

As we can see, the most popular song of 2019 is Bad Guy by Billie Eilish.

Let us check the technical aspects of the top 10 songs.

Code

sns.set()
df_top_tracks.plot.area(y=['Beats.Per.Minute', 'Energy',
       'Danceability', 'Loudness..dB..', 'Liveness', 'Valence.',
       'Acousticness..', 'Speechiness.'],alpha=0.4,figsize=(15,12),stacked=True)
plt.title('Technical variations of top 10 songs',size=20)
plt.xlabel('Track number')

Text(0.5, 0, 'Track number')

The track number matches with the dataframe shown above for top 10 songs.

Key takeaway

We can conclude that the :

Loudness variation was low
Variations in beat per min was high
Energy variation was fairly low
Danceability variation was low
Valence variation was high
Acousticeness variation was high
Speechiness variation was high

Let us try to visualise the distribution of some of the features using plots.

Code

sns.set(style='darkgrid')
fig2 = plt.figure(figsize=(10, 8))
ax1 = fig2.add_subplot(221)
sns.boxplot(x='Beats.Per.Minute', data=df_top_tracks, orient='v', ax=ax1)

ax2 = fig2.add_subplot(222)
sns.boxplot(x='Valence.', data=df_top_tracks, orient='v', ax=ax2, color='indianred')

ax3 = fig2.add_subplot(223)
sns.boxplot(x='Acousticness..', data=df_top_tracks, orient='v', ax=ax3, color='green')

ax4 = fig2.add_subplot(224)
sns.boxplot(x='Speechiness.', data=df_top_tracks, orient='v', ax=ax4, color='yellow')

plt.show()

These were the four features that had high variance. As a result, the inter quartile boxes are large. The horizontal lines on the boxes show the median values of each feature.

2.4 d) Correlation heatmap with popularity

Let us check how the various features are correlated with each other using a heatmap.

Code

plt.figure(figsize=(10, 8))
numeric_columns = df.select_dtypes(include=['float64', 'int64'])
corr_songs = numeric_columns.corr()
sns.heatmap(corr_songs, annot=True, fmt='.2f', cmap='rocket')
plt.show()

Key takeaway

From the heatmap above, following conclusions maybe drawn:

Speechiness and Beats per min have a good correlation.
Loudness and energy of the song have a high correlation.
Valence and energy also seem to have a good correlation.

2.5 e) Track length

Let us check how the track lengths are changing for the tracks.

Code

fig3=px.line(df,y='Length.',x='Track.Name',height=800, width=1000)
fig3.update_layout(title='Length of top songs',title_x=0.5,plot_bgcolor="black")

fig3.update_xaxes(visible=False)
fig3.show()

2.6 f) Correlation between energy and loudness

Let us check the correlation using a kernel density estimation plot.

Code

sns.jointplot(x='Loudness..dB..',y='Energy',data=df,kind='kde',color='green')

2.7 g) Correlation between BPM and Speechiness

Code

sns.jointplot(x='Beats.Per.Minute',y='Speechiness.',data=df,kind='kde',color='red')